NSF PAR Search | NSF Public Access Repository

Remote Memory Prefetching: Is Coarse-grained Fine?

https://doi.org/10.1145/3680256.3721318

McMahon, James; Pawar, Vinita; Stutsman, Ryan (May 2025, ACM)

Full Text Available
ObjecTier: Non-Invasively Boosting Memory Tiering Performance

https://doi.org/10.1145/3680256.3721319

Pawar, Vinita; Bhardwaj, Ankit; Stutsman, Ryan (May 2025, ACM)

Full Text Available
Stop Taking the Scenic Route: the Shortest Distance Between the CPU and the NIC is MMIO

https://doi.org/10.1145/3713082.3730389

Liew, Wei Siew; Rahaman, Md Ashfaqur; McMahon, James; Stutsman, Ryan; Nagarajan, Vijay (May 2025, ACM)

Full Text Available
Efficient linearizability checking for actor‐based systems

https://doi.org/10.1002/spe.3251

Al‐Mahfoudh, Mohammed S; Stutsman, Ryan; Gopalakrishnan, Ganesh (November 2023, Software: Practice and Experience)

Recent demand for distributed software has led to a surge in the popularity of actor‐based frameworks. However, even with the stylized message passing model of actors, writing correct distributed software is still difficult. We present our work on linearizability checking in DS2, an integrated framework for specifying, synthesizing, and testing distributed actor systems. The key insight of our approach is that subcomponents of distributed actor systems often represent common algorithms or data structures (e.g., a distributed hash table or tree) that can be validated against a simple sequential model of the system. This makes it easy for developers to validate their concurrent actor systems without complex specifications. DS2 automatically explores the concurrent schedules that the system could arrive at, and it compares the observed output of the system to ensure it is equivalent to what the sequential implementation could have produced. We describe DS2's linearizability checking and test it on several concurrent replication algorithms from the literature. We explore in detail how different algorithms for enumerating the model schedule space fare in finding bugs in actor systems, and we present our own refinements of algorithms for exploring actor system schedules, which we show are effective in finding bugs.
Full Text Available
BPF-oF: Storage Function Pushdown Over the Network

Zarkadas, Ioannis; Zussman, Tal; Carin, Jeremy; Jiang, Sheng; Zhong, Yuhong; Pfefferle, Jonas; Franke, Hubertus; Yang, Junfeng; Kaffes, Kostis; Stutsman, Ryan; et al (September 2023, Arxiv)

Full Text Available
Sharding the State Machine: Automated Modular Reasoning for Complex Concurrent Systems

Hance, Travis; Zhou, Yi; Lattuada, Andrea; Achermann, Reto; Conway, Alex; Stutsman, Ryan; Zellweger, Gerd; Hawblitzel, Chris; Howell, Jon; Parno, Bryan (July 2023, Proceedings of the USENIX Symposium on Operating Systems Design and Implementation (OSDI))

We present IronSync, an automated verification framework for concurrent code with shared memory. IronSync scales to complex systems by splitting system-wide proofs into isolated concerns such that each can be substantially automated. As a starting point, IronSync’s ownership type system allows a developer to straightforwardly prove both data safety and the logical correctness of thread-local operations. IronSync then introduces the concept of a Localized Transition System, which connects the correctness of local actions to the correctness of the entire system. We demonstrate IronSync by verifying two state-of-the-art concurrent systems comprising thousands of lines: a library for black-box replication on NUMA architectures, and a highly concurrent page cache.
Full Text Available
Cache-coherent accelerators for persistent memory crash consistency

https://doi.org/10.1145/3538643.3539752

Bhardwaj, Ankit; Thornley, Todd; Pawar, Vinita; Achermann, Reto; Zellweger, Gerd; Stutsman, Ryan (June 2022, Proceedings of the 14th ACM Workshop on Hot Topics in Storage and File Systems)

Building persistent memory (PM) data structures is difficult because crashes interrupt operations, leaving data structures in an inconsistent state. Solving this requires augmenting code that modifies PM state to ensure that interrupted operations can be completed or undone. Today, this is done using careful, hand-crafted code, a compiler pass, or page faults. We propose a new, easy way to transform volatile data structure code to work with PM that uses a cache-coherent accelerator to do this augmentation, and we show that it may outperform existing approaches for building PM structures.
Full Text Available
Achieving high throughput and elasticity in a larger-than-memory store

https://doi.org/10.14778/3457390.3457406

Kulkarni, Chinmay; Chandramouli, Badrish; Stutsman, Ryan (April 2021, Proceedings of the VLDB Endowment)

Millions of sensors, mobile applications and machines now generate billions of events. Specialized many-core key-value stores (KVSs) can ingest and index these events at high rates (over 100 Mops/s on one machine) if events are generated on the same machine; however, to be practical and cost-effective they must ingest events over the network and scale across cloud resources elastically. We present Shadowfax, a new distributed KVS based on FASTER, that transparently spans DRAM, SSDs, and cloud blob storage while serving 130 Mops/s/VM over commodity Azure VMs using conventional Linux TCP. Beyond high single-VM performance, Shadowfax uses a unique approach to distributed reconfiguration that avoids any server-side key ownership checks or cross-core coordination both during normal operation and migration. Hence, Shadowfax can shift load in 17 s to improve system throughput by 10 Mops/s with little disruption. Compared to the state-of-the-art, it has 8x better throughput (than Seastar+memcached) and avoids costly I/O to move cold data during migration. On 12 machines, Shadowfax retains its high throughput to perform 930 Mops/s, which, to the best of our knowledge, is the highest reported throughput for a distributed KVS used for large-scale data ingestion and indexing.
Full Text Available
XRP: In-Kernel Storage Functions with eBPF

Zhong, Yuhong; Li, Haoyu; Wu, Yu Jian; Zarkadas, Ioannis; Tao, Jeffrey; Mesterhazy, Evan; Makris, Michael; Yang, Junfeng; Tai, Amy; Stutsman, Ryan; et al (July 2022, Proceedings of the 16th USENIX Symposium on Operating Systems Design and Implementation)

With the emergence of microsecond-scale NVMe storage devices, the Linux kernel storage stack overhead has become significant, almost doubling access times. We present XRP, a framework that allows applications to execute user-defined storage functions, such as index lookups or aggregations, from an eBPF hook in the NVMe driver, safely bypassing most of the kernel’s storage stack. To preserve file system semantics, XRP propagates a small amount of kernel state to its NVMe driver hook where the user-registered eBPF functions are called. We show how two key-value stores, BPF-KV, a simple B+-tree key-value store, and WiredTiger, a popular log-structured merge tree storage engine, can leverage XRP to significantly improve throughput and latency.
Full Text Available
XRP: In-Kernel Storage Functions with eBPF

Zhong, Yuhong; Li, Haoyu; Wu, Yu Jian; Zarkadas, Ioannis; Tao, Jeffrey; Mesterhazy, Evan; Makris, Michael; Yang, Junfeng; Tai, Amy; Stutsman, Ryan; et al (July 2022, 16th USENIX Symposium on Operating Systems Design and Implementation (OSDI 22))

Full Text Available
